Abstract



The computation of the Tukey depth, also called halfspace depth, is very demanding, even in low 
dimensional spaces, because it requires the consideration of all possible one-dimensional projections. 
In this paper we propose a random depth which approximates the Tukey depth. It only takes into 
account a finite number of one-dimensional projections which are chosen at random. Thus, this 
random depth requires a very small computation time even in high dimensional spaces. Moreover, it 
is easily extended to cover the functional framework. 

We present some simulations indicating how many projections should be considered depending
on the sample size and on the dimension of the sample space. We also compare this depth with
some others proposed in the literature. It is noteworthy that the random depth, based on a very low
number of projections, obtains results very similar to those obtained with other depths.

1 Introduction

This paper is written in the same spirit as [9]. In the abstract of that paper, D.J. Hand
states that "...simple methods typically yield performance almost as good as more sophisticated
methods to the extent that the difference in performance may be swamped by other
sources of uncertainty...". Hand's work is related to classification techniques. Here we
analyze a conceptually simple and easy to compute multidimensional depth that can be 
applied to functional problems and that provides results comparable to those obtained 
with more involved depths. 

*Research partially supported by the Spanish Ministerio de Ciencia y Tecnología, grant MTM2005-08519-C02-02 and
the Consejería de Educación y Cultura de la Junta de Castilla y León, grant PAPIJCL VA102/06.

Depths are intended to order a given set in the sense that if a datum is moved toward
the center of the data cloud, then its depth increases and if the datum is moved toward 
the outside, then its depth decreases. 

More generally, given a probability distribution P defined in a multidimensional (or 
even infinite-dimensional) space X, a depth tries to order the points in X from the "center 
(of P)" to the "outward (of P)". Obviously, this problem includes data sets if we consider 
P as the empirical distribution associated to the data set at hand. Thus, in what follows, 
we will always refer to the depth associated to a probability distribution P. 

In the one-dimensional case, it is reasonable to order the points using the order induced 
by the function 

$x \mapsto D_1(x, P) := \min\{P(-\infty, x], P[x, \infty)\}.$ (1)

Thus, the points are ordered following the decreasing order of the absolute values of the 
differences between their percentiles and 50, and the deepest points are the medians of P. 
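For a finite sample, P in (1) is the empirical distribution, so both probabilities reduce to counting. The following is a minimal Python sketch of this empirical one-dimensional depth. The function name `depth_1d` and the toy sample are illustrative assumptions and are not taken from the authors' code.

```python
def depth_1d(x, sample):
    """Empirical version of D_1(x, P_n) = min{P_n(-inf, x], P_n[x, inf)}."""
    n = len(sample)
    left = sum(1 for s in sample if s <= x) / n   # P_n(-inf, x]
    right = sum(1 for s in sample if s >= x) / n  # P_n[x, inf)
    return min(left, right)


# Hypothetical toy sample: the median 3.0 is the deepest point.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
depths = [depth_1d(s, sample) for s in sample]
```

In this toy sample the depth decreases symmetrically as a point moves from the median toward either extreme, which is the ordering described above.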

Several multidimensional depths have been proposed (see, for instance, the recent book
[10]) but here we are mainly interested in the Tukey (or halfspace) depth (see [IT]). If
$x \in \mathbb{R}^p$ then the Tukey depth of x with respect to P, $D_T(x, P)$, is the minimal probability
which can be attained in the closed halfspaces containing x. According to , this depth
behaves very well in comparison with various competitors.

An equivalent definition of $D_T(x, P)$ is the following. Given $v \in \mathbb{R}^p$, let $\Pi_v$ be the
projection of $\mathbb{R}^p$ on the one-dimensional subspace generated by v. Thus, $P \circ \Pi_v^{-1}$ is the
marginal of P on this subspace, and it is obvious that

$D_T(x, P) = \inf\{D_1(\Pi_v(x), P \circ \Pi_v^{-1}) : v \in \mathbb{R}^p\}, \quad x \in \mathbb{R}^p.$ (2)

I.e., $D_T(x, P)$ is the infimum of all possible one-dimensional depths of the one-dimensional
projections of x, where those depths are computed with respect to the corresponding
(one-dimensional) marginals of P. Some other depths based on the consideration of all possible
one-dimensional projections, but replacing $D_1(x, P)$ by some other function, have been
proposed (see, for instance, [IS]). We consider that what follows could be applied to all
of them, but we have chosen the Tukey depth to test it concretely.

Perhaps the most important drawback of the Tukey depth is the required computational
time. This time is more or less reasonable if p = 2, but it becomes prohibitive
even for p = 8 [15, pag. 54]. To reduce the time, in [20] (page 2234) it is proposed to
approximate its values using randomly selected projections.

On the other hand, in [6], a random depth is defined. In this paper, given a point x,
the authors propose to choose at random a finite number of vectors $v_1, \dots, v_k$, and then
take as depth of x the mean of the values $D_1(\Pi_{v_i}(x), P \circ \Pi_{v_i}^{-1})$, $i = 1, \dots, k$.

Our approach follows more closely the suggestion in [20]: We simply replace the infimum
in (2) by a minimum over a finite number of randomly chosen projections.

Definition 1.1 Let P be a probability distribution on $\mathbb{R}^p$. Let $x \in \mathbb{R}^p$, $k \in \mathbb{N}$ and let $\nu$
be an absolutely continuous distribution on $\mathbb{R}^p$. The random Tukey depth of x with respect
to P based on k random vectors chosen with $\nu$ is

$D_{T,k,\nu}(x, P) = \min\{D_1(\Pi_{v_i}(x), P \circ \Pi_{v_i}^{-1}) : i = 1, \dots, k\}, \quad x \in \mathbb{R}^p,$

where $v_1, \dots, v_k$ are independent and identically distributed random vectors with distribution $\nu$.
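Definition 1.1 can be sketched for an empirical distribution as follows. This is a minimal Python illustration under stated assumptions: the directions are drawn from a standard Gaussian on $\mathbb{R}^p$ (one possible absolutely continuous choice of $\nu$), the projection is represented by the inner product with v (rescaling v by a positive constant does not change the one-dimensional depth), and all function names are hypothetical.

```python
import random


def depth_1d(x, sample):
    """Empirical one-dimensional depth min{P_n(-inf, x], P_n[x, inf)}."""
    n = len(sample)
    left = sum(1 for s in sample if s <= x) / n
    right = sum(1 for s in sample if s >= x) / n
    return min(left, right)


def gaussian_direction(p, rng):
    """Draw a direction from a standard Gaussian on R^p (assumed choice of nu)."""
    return [rng.gauss(0.0, 1.0) for _ in range(p)]


def random_tukey_depth(x, data, directions):
    """Minimum, over the given directions, of the 1-D depth of the projected point."""
    best = 1.0
    for v in directions:
        proj_x = sum(a * b for a, b in zip(x, v))
        proj_data = [sum(a * b for a, b in zip(row, v)) for row in data]
        best = min(best, depth_1d(proj_x, proj_data))
    return best
```

A possible call, with a hypothetical data set `data` given as a list of points in $\mathbb{R}^2$, would be `random_tukey_depth(x, data, [gaussian_direction(2, random.Random(0)) for _ in range(8)])`. The same directions should be reused for every point whose depth is computed in the same sample.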



Obviously, $D_{T,k,\nu}(x, P)$ is a random variable. It may seem a bit strange to take a
random quantity to measure the depth of a point, which is inherently not-random. We 
have two reasons to take this point of view. 

Firstly, Theorem 4.1 in [4] shows that if P and Q are probability distributions on $\mathbb{R}^p$,
$\nu$ is an absolutely continuous distribution on $\mathbb{R}^p$ and

$\nu\{v \in \mathbb{R}^p : P \circ \Pi_v^{-1} = Q \circ \Pi_v^{-1}\} > 0,$

then P = Q. In other words, if we have two different distributions, and we randomly 
choose a marginal of them, those marginals are almost surely different. In fact, it is also 
required that at least one of the distributions is determined by their moments, but this 
is not too important for the time being. According to this result, one randomly chosen 
projection is enough to distinguish between two p-dimensional distributions. Since the
depths determine one-dimensional distributions, a depth computed on just one random
projection allows one to distinguish between two distributions.

Secondly, if the support of $\nu$ is $\mathbb{R}^p$ and, for every k, $\{v_1, \dots, v_k\} \subset \{v_1, \dots, v_{k+1}\}$, then

$D_{T,k,\nu}(x, P) \geq D_{T,k+1,\nu}(x, P) \to D_T(x, P)$, a.s. (3)

Therefore, if we choose a large enough k, the effect of the randomness in $D_{T,k,\nu}$ will be
negligible. Of course, the question of interest here is to learn how large k must be, because
values of k that are too large would make this definition useless.

One way to select k is to compare $D_T$ and $D_{T,k,\nu}$, but the long computation times
required to obtain $D_T$ make those comparisons impractical. Instead, we have
decided to choose a situation in which the depths of the points are clearly defined and
can easily be computed with a different depth.

If P is an elliptical distribution with centralization parameter $\mu$ and dispersion matrix
$\Sigma$, then it seems that every reasonable depth should consider $\mu$ as the deepest point,
that points at the same Mahalanobis distance from $\mu$ should have the same depth, and
that differences in depth should correspond to differences in Mahalanobis distance from $\mu$.
Then, in this situation, every depth should be a monotone function of the Mahalanobis
depth [13], where, given $x \in \mathbb{R}^p$, this depth is

$D_M(x, P) := \frac{1}{1 + (x - \mu)^t \Sigma^{-1} (x - \mu)}.$ (4)

Therefore, we can choose the right k in $D_{T,k,\nu}$ as follows: If P is elliptical, $D_T(\cdot, P)$
is a monotone function of $D_M(\cdot, P)$. Thus, from (3), the larger the k, the larger the
resemblance between $D_{T,k,\nu}(\cdot, P)$ and a monotone function of $D_M(\cdot, P)$. However, there
should exist a value $k_0$ from which this resemblance starts to stabilize. This is the value
for k we are looking for.

However, in practice, we do not know P, and we only have a random sample. It seems
that the selection of $k_0$ should take this fact into consideration. In Section 2 we present
a procedure to do this.

The results of the comparison of $D_M(\cdot, P)$ and $D_{T,k,\nu}(\cdot, P)$ for several sample sizes,
dimensions and elliptical distributions are shown in Table 2.1. According to this table,
k = 36 is the maximum number of directions required if the sample size is below 1,000.

Once the right values of k have been fixed, we carry out, also in Section 2, a study
to compare $D_{T,k,\nu}$ with $D_T$ from the applications point of view. The results are quite
encouraging.

Section 2 ends with a comparison of the time required to compute $D_{T,k,\nu}$ and the time
to compute $D_M$. This comparison turns out to be favorable for $D_{T,k,\nu}$.

An important advantage of Definition 1.1 is that it can be applied in every space in
which projections can be computed. Since this is an easy task in Hilbert spaces and
Theorem 4.1 in [4] holds in separable Hilbert spaces, we propose to employ Definition 1.1
to compute depths of points in those spaces.

A difference with the p-dimensional case is that here we are not aware of any situation
in which a gold standard to compare depths exists. However, in [12], the authors employ
functional depths in a classification problem. In Section 3 we compare the results obtained
with the random depth with those obtained in [12] in the same problem.

This study reinforces the feeling that the values obtained in Table 2.1 are accurate.
Following this table, in Section 3 we have taken k = 10 because the sample sizes are
around 50. The results have been satisfactory even if there is no reason to assume any 
particular model on the distribution generating the samples. 

Some other functional depths (not considered here) have been proposed in the literature.
We are aware of the Fraiman-Muniz depth (introduced in [8]), the h-mode depth
(proposed in 0) and the above mentioned random depth and a double random depth
(RPD) which appear in [6]. An interesting application of those depths to outlier detection
is made in [7].

In [6], the authors apply the depths that they analyze to the same classification problem 
that we study here. The proportions of right classifications that they obtain with depths 
are similar to those reported here except for the RPD depth. This is a random depth 
which takes into account not only the curves but also their derivatives. Thus, it handles 
more information than we employ here, and the results are not comparable. 

We want to mention that Theorem 4.1 in [4] provides the theoretical background for
the random depth proposed in [6] whose definition, in fact, only considers a vector. The
only reason the authors give for handling k (> 1) randomly chosen vectors is to provide
more stability to the definition. Moreover, Theorem 4.1 in [4] has also been applied to
construct goodness-of-fit tests, for instance, in [1], [2] and [3]. In those papers, the authors
also handle more than one projection. They take k ranging from 1 to 25 in [1], ranging
from 2 to 40 in [2], and k = 100 in [3] with the same objective as in [6] and also with no
specific reason to make those selections.

We consider that the results provided in this paper could help to settle the way in
which the number of random projections should be chosen.

Computations have been carried out with MatLab. Computational codes are available 
from the authors upon request. 

2 How many random projections? Testing homogeneity 

In this section we analyze the p-dimensional case. Obviously, Theorem 4.1 in [4] also holds
if we take $\nu$ a probability distribution absolutely continuous with respect to the surface
measure on the unit sphere in $\mathbb{R}^p$. We are also interested in (3) holding. Thus, in this
section, we fix $\nu$ to be the uniform distribution on the unit sphere, and we will suppress
the subindex $\nu$ in the notation, writing $D_{T,k}$.

As stated in the introduction, to decide how to choose k, we will analyze the case in
which P is an elliptical distribution by comparing the functions $D_M(\cdot, P)$ and $D_{T,k}(\cdot, P)$
for several values of k.

Taking into account that depths only try to rank points according to their closeness
to the center of P, it is reasonable to measure the resemblance between $D_{T,k}(\cdot, P)$ and
$D_M(\cdot, P)$ looking only at the ranks of the points. This is equivalent to employing the
Spearman correlation coefficient, $\rho$. Thus, the resemblance that we handle here is

$r_{k,P} := \rho\left(D_{T,k}(X, P), D_M(X, P)\right),$ (5)

where X is a random variable with distribution P.

If P is an elliptical distribution, then the function $k \mapsto r_{k,P}$ is strictly increasing. We
try to identify k with the point $k_0$ from which the increments become negligible.

Moreover, in practice, we will not have a distribution P, but a random sample $x_1, \dots, x_n$
taken from P. This leads us to replace P in (5) by the empirical distribution $P_n$
($P_n[B] = \#(B \cap \{x_1, \dots, x_n\})/n$), which does not follow the model exactly and, consequently, the
function $k \mapsto r_{k,P_n}$ is not necessarily increasing. We propose to identify $k_0$ with the point at
which $r_{k,P_n}$ starts to oscillate or, more precisely, to estimate $k_0$ by

$k_0 = \inf\{k > 1 : r_{k,P_n} > r_{k+1,P_n}\}.$
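The selection rule can be sketched in Python as follows, assuming the correlations $r_{1,P_n}, r_{2,P_n}, \dots$ have already been computed from the two depth vectors. The helper names (`ranks`, `spearman`, `estimate_k0`) are hypothetical, and the lower bound on k is read from the formula as printed and exposed as the parameter `k_min`, since the source is ambiguous on whether the search starts at 1 or 2.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            result[order[t]] = average
        i = j + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


def estimate_k0(r, k_min=2):
    """r[k - 1] holds r_{k,P_n}; return the smallest k >= k_min with r_k > r_{k+1}."""
    for k in range(k_min, len(r)):
        if r[k - 1] > r[k]:
            return k
    return None  # no drop observed within the values supplied


# Hypothetical correlations r_1, ..., r_5: the first drop occurs after k = 3.
r = [0.50, 0.70, 0.80, 0.79, 0.85]
k0 = estimate_k0(r)
```

In a full simulation, `r[k - 1]` would be `spearman(random_depths_k, mahalanobis_depths)`, where both vectors contain the depths of all sample points; those two inputs are assumed to be available here.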

To check the dependence between $k_0$ and the underlying distribution, we employ samples
taken from multidimensional standard Gaussian distributions, from distributions with
independent double exponential marginals and with independent Cauchy marginals. We
are also interested in looking at the dependence between $k_0$ and the dimension of the space
and the sample size. To do this, we have selected five dimensions (p = 2, 4, 8, 25, 50), and
six sample sizes (n = 25, 50, 100, 250, 500, 1,000).

We need to compute the location center and the dispersion matrix of $P_n$ to be employed
in $D_M$. Those parameters should depend on the distribution which has generated the
sample. We mean the following: the covariance matrix is an appropriate parameter in
the Gaussian and exponential cases. But it is not adequate for the Cauchy distribution,
where we have identified $\Sigma$ with the robust covariance matrix proposed in [H], page 206.
On the other hand, we have replaced $\mu$ by the sample mean in the Gaussian case and by
the coordinate-wise median in the exponential and Cauchy settings.



We have done 10,000 simulations under each set of conditions. In Table 2.1 we show
the mean and the 95% percentile of the values obtained for $k_0$.

Table 2.1 Mean and 95% percentile of the optimum values for the number of required
random projections k for the sample sizes, dimensions and distributions shown.
Symbol * means that Mahalanobis depth is not defined in those cases because the
dispersion matrix is degenerate. Symbol ? marks entries that are illegible in the available copy.

                                                         Sample sizes
Dimension  Distribution    Statistic           25      50     100     250     500   1,000

p = 2      Gaussian        mean              3.61    3.90    4.17    4.19    4.23    4.20
                           95% percentile       ?       ?       8       8       ?       ?
           D. Exponential  mean                 ?       ?    3.97    4.18    4.00    4.44
                           95% percentile       ?       7       7       8       ?       ?
           Cauchy          mean                 ?       ?    3.81    4.06    4.00       ?
                           95% percentile       ?       ?       7       8       ?      11

p = 4      Gaussian        mean              4.06    5.53    7.19    9.12    9.99   10.61
                           95% percentile       7      10      12      16      18      20
           D. Exponential  mean                 ?       ?    6.69    8.49    9.30    9.93
                           95% percentile       7       9      12      15      17      19
           Cauchy          mean                 ?    4.44    5.48    6.66       ?       ?
                           95% percentile       ?       ?       ?       ?      14       ?

p = 8      Gaussian        mean              3.45    4.80    6.91   11.56   15.80   20.18
                           95% percentile       ?       ?      13      20      27      35
           D. Exponential  mean              3.76    5.11    7.53   11.98   15.80   19.41
                           95% percentile       7       9      13      20      27      34
           Cauchy          mean              3.58    4.64    6.23    8.53   10.06   11.43
                           95% percentile       6       9      11      16      19      23

p = 25     Gaussian        mean                 *    3.05    4.25    6.53   10.07   16.04
                           95% percentile       *       6       9      14      20      29
           D. Exponential  mean                 *    3.84    5.27    8.93   14.08   21.64
                           95% percentile       *       7      10      17      25      36
           Cauchy          mean                 *    4.74    6.67   10.22   13.54   16.50
                           95% percentile       *       9      12      18      24      30

p = 50     Gaussian        mean                 *       *    3.16    4.71    6.55    9.96
                           95% percentile       *       *       7      10      14      20
           D. Exponential  mean                 *       *    4.17    6.48    9.88   15.64
                           95% percentile       *       *       8      13      19      28
           Cauchy          mean                 *       *    6.75   10.94   15.16   19.61
                           95% percentile       *       *      12      19      26      34



Since, in each simulation, the obtained value of $k_0$ is bounded from below by 1 and
it can take arbitrarily large values, the distribution of $k_0$ is right-skewed. Thus, the mean
produces larger values than the median, giving some guarantee against
selecting values that are too low. Moreover, even if the mean could be a reasonable
selection, we chose the 95% percentile for additional safety.



It is apparent from Table 2.1 that, in every dimension, the optimum value for k increases 
with the sample size. This increment is due to the fact that when n increases, the function 
$r_{k,P_n}$ approaches $r_{k,P}$, which is strictly increasing. In other words, when we take a random
sample, the randomness introduces some noise in the model which makes taking high 
values of k useless. However, when n increases, this randomness is lower, and, then, it is 
worth it to increase k. 

The variation of the optimum k with the dimension is more striking. If we fix a 
sample size, in the Gaussian and exponential case, it happens that the optimum value, 
first increases with p but, after a change point, it decreases. However, the change point 
increases with the sample size. 

It seems obvious that the number of projections required to accurately represent a cloud 
of points should increase with p. But, why does this number decrease after the change 
point? The answer lies in the noise introduced by the randomness in taking the sample. 
The problem is that we are comparing the random Tukey depth with the Mahalanobis 
one. But, in order to compute $D_M$ we need to estimate the dispersion matrix and, if we
keep the sample size fixed, this estimation worsens when the dimension increases. The 
noise introduced by this fact makes considering high values for k useless. We will briefly 
analyze this point. 

Let us consider the Gaussian case. In order to figure out how good the sample covariance
is as an estimator, depending on the sample size and the dimension of the space, we
show in Table 2.2 the mean of the determinants of the sample covariance matrices for the
same sample sizes and dimensions as in Table 2.1, obtained along 1,000 simulations taken
from a standard Gaussian distribution.

Table 2.2 Mean of the determinants of the sample covariance matrices computed in 1,000
random samples taken from a standard Gaussian distribution.

                            Sample sizes
Dimension      25      50     100     250     500   1,000

p = 2        .953    .982    .990    .998    .998    .999
p = 4        .771    .884    .944    .973    .988    .994
p = 8        .273    .546    .749    .891    .946    .973
p = 25          *    .001    .037    .288    .546    .736
p = 50          *       *    .000    .005    .079    .289


The comparison of Tables 2.1 and 2.2 shows that the 95% percentiles shown in Table
2.1 for the standard Gaussian distribution increase while the mean of the determinant of
the covariance matrix in Table 2.2 is above (roughly speaking) .750 and start to decrease
when the determinant is below this quantity. The same behavior can be observed in the
double exponential case.

With respect to the Cauchy case, the mean of the determinant of the employed estimate 
of the dispersion matrix never falls below this threshold, for the considered sample sizes 
and dimensions. 

Precisely, this difference in behavior between the Cauchy and the other distributions
makes some differences appear if we compare the results between distributions with the
same sample size and dimension but, in most cases, those differences are not significantly
large. Except when the dimension is p = 50, the ratio between the largest and the smallest 95%
percentiles lies between 1 and 1.5. However, it is noticeable that if, for each sample size
and distribution, we take the largest 95% percentile along dimensions, then
the differences between the largest and the smallest values are just 1, except for sample size
n = 1,000, for which this difference is 2.

Thus, we propose to choose k, for a fixed dimension, as the maximum of the 95% 
percentiles along distributions and sample sizes. However, if we prefer to fix the sample 
size, then we propose to take the maximum on dimensions and distributions. Either way, 
to have a full guarantee it is only required to take the maximum in the table which gives 
the (surprisingly low) value k = 36. 

At this point, Hand's statement quoted at the beginning applies. There is no doubt that, theoretically,
the accuracy of the depth improves if k increases. However, for a fixed sample size, the
noise coming from the sampling process makes large values for k useless. This point is
reinforced in the following subsection where we compute $D_T$, with p = 2, using 1,000
vectors with no practical gain.



2.1 Testing homogeneity 



Our goal in this subsection is to show that the values obtained for k in Table 2.1 give depths
which provide results similar to those obtained in practice with the Tukey depth. To this
end, we are going to reproduce the simulation study carried out in [11], where the authors
apply depth measures to test differences in homogeneity between two distributions. Let us
begin by giving a brief description of the problem and the procedure. Additional details
can be found in [11].

Assume that we have two random samples $\{X_1, \dots, X_{n_1}\}$ and $\{Y_1, \dots, Y_{n_2}\}$ taken from
the centered distributions P and Q respectively. Let us assume that those distributions
coincide except for a scale factor, i.e., we are assuming that there exists r > 0 such
that the r.v.'s $\{rX_1, \dots, rX_{n_1}\}$ and $\{Y_1, \dots, Y_{n_2}\}$ are identically distributed. The problem
consists in testing the hypotheses:

$H_0$ : r = 1 (both scales are the same)
$H_a$ : r > 1 (Q has a larger scale).

The idea is that, under the alternative, the observations in the second sample should
appear in the outside part of the joint sample $\{X_1, \dots, X_{n_1}, Y_1, \dots, Y_{n_2}\}$ and, consequently,
should have lower depths than the points in the first sample. Thus, it is possible to test
$H_0$ against $H_a$ by computing the depths of the points $\{Y_1, \dots, Y_{n_2}\}$ in the joint sample,
replacing them by their ranks and rejecting $H_0$ if those ranks are small.

The Wilcoxon rank-sum test can be used to test when the ranks of the points $\{Y_1, \dots, Y_{n_2}\}$
are small. In [11] several possibilities to break the ties are proposed. We have tried all of
them, with no important differences. Thus, we have selected to break the ties at random
as the only method to be shown here.
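The rank-based test just described can be sketched as follows. This is a minimal Python illustration under stated assumptions: the depths of all points in the joint sample are already computed, ties are broken at random as in the text, and the p-value uses the usual normal approximation of the Wilcoxon rank-sum statistic (the source does not say whether an exact or an approximate null distribution was used). The function name is hypothetical.

```python
import random
from statistics import NormalDist


def rank_sum_pvalue(depth_x, depth_y, rng):
    """One-sided p-value for 'the Y depths tend to be smaller' (normal approximation)."""
    # Each entry carries a random key so that tied depths are ordered at random.
    pooled = [(d, rng.random(), 0) for d in depth_x]
    pooled += [(d, rng.random(), 1) for d in depth_y]
    pooled.sort()
    # W is the sum of the ranks of the second sample within the joint sample.
    w = sum(rank for rank, (_, _, label) in enumerate(pooled, start=1) if label == 1)
    n1, n2 = len(depth_x), len(depth_y)
    n = n1 + n2
    mean = n2 * (n + 1) / 2.0
    sd = (n1 * n2 * (n + 1) / 12.0) ** 0.5
    # Small W (outlying second sample) gives a small p-value.
    return NormalDist().cdf((w - mean) / sd)
```

Under this sketch, $H_0$ would be rejected at level .05 when `rank_sum_pvalue(depth_x, depth_y, random.Random())` is below .05, where `depth_x` and `depth_y` hold the depths of the X and Y points in the joint sample.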



In Table 2.3 we show the rate of rejections under the stated conditions when we carry
out the test at the significance level $\alpha$ = .05. The table also includes, in parentheses,
the rejection rates when the random depth is replaced by the Tukey depth using 1,000
directions uniformly scattered on the upper halfspace.

The distributions used in the simulations are the 2-dimensional standard Gaussian,
and the double exponential and Cauchy with independent marginals. We have centered
the samples from the Gaussian distribution in mean and the samples from the double
exponential and the Cauchy distributions in component-wise median. We have considered
the values r = 1, 1.2, 2, and $n_1 = n_2 = n$ with $n \in \{20, 30, 100\}$. We have done 10,000
simulations for each combination of distribution, sample size and r.

Concerning the value of k for the random depth, since we have to compute random
depths in samples with sizes $2n \in \{40, 60, 200\}$ we have chosen k = 6, 7 and 8 respectively.
Those values are close to the suggestions in Table 2.1 for p = 2 and the corresponding
sample sizes. We have not followed the hints at the end of the previous section because
we are interested in seeing the behavior of the procedure with k as low as possible.

Table 2.3 Rate of rejections in 10,000 simulations using $D_{T,k}$ with k as shown (in
parentheses, the rate with $D_T$) for the considered distributions, sample sizes and values of
r. Dimension is p = 2. The significance level is .05.

                                            Distribution
Sample size  Scale factor   Cauchy          Gaussian        D. exponential

n = 20       r = 1          .054 (.057)     .059 (.061)     .053 (.056)
k = 6        r = 1.2        .125 (.125)     .249 (.259)     .174 (.177)
             r = 2          .556 (.552)     .963 (.963)     .833 (.824)

n = 30       r = 1          .051 (.048)     .051 (.052)     .052 (.053)
k = 7        r = 1.2        .140 (.148)     .316 (.325)     .219 (.214)
             r = 2          .691 (.699)     .995 (.996)     .940 (.941)

n = 100      r = 1          .086 (.055)     .055 (.057)     .050 (.051)
k = 8        r = 1.2        .297 (.300)     .719 (.720)     .507 (.514)
             r = 2          .991 (.994)     1 (1)           1 (1)


In [11] the previous ideas are also applied to check the homogeneity between K samples,
K > 2. The problem is the following. Let $\{X_{1,1}, \dots, X_{1,n_1}\}, \dots, \{X_{K,1}, \dots, X_{K,n_K}\}$ be
random samples obtained, respectively, from the distributions $P_1, \dots, P_K$ and let us assume
that there exist $r_1, \dots, r_{K-1} > 0$ such that the random vectors $r_1 X_{1,1}, \dots, r_1 X_{1,n_1}, \dots,$
$r_{K-1} X_{K-1,1}, \dots, r_{K-1} X_{K-1,n_{K-1}}, X_{K,1}, \dots, X_{K,n_K}$ are identically distributed.
We are interested in testing the following hypotheses:

$H_0$ : $r_i = 1$, $i = 1, \dots, K-1$ (all scales are the same)
$H_a$ : there exists $r_i \neq 1$ (scales are different).

If we center each sample separately, join all the observations in a single sample,
compute the depths of all the points and transform those depths into ranks, then we can
apply the Kruskal-Wallis test to check for a lack of homogeneity between the ranks
in each sub-sample.
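The Kruskal-Wallis step can be sketched in Python as follows, assuming the depths of the points of each sub-sample within the joint sample are already available. Ties are broken at random, so no tie correction is applied. The chi-square approximation of the null distribution is an assumption of this sketch, and the closed-form p-value below is valid only for K = 3 groups (chi-square with 2 degrees of freedom), which is the case simulated here. Function names are hypothetical.

```python
import math
import random


def kruskal_wallis_h(groups, rng):
    """Kruskal-Wallis H statistic for lists of depth values, ties broken at random."""
    pooled = [(d, rng.random(), g) for g, values in enumerate(groups) for d in values]
    pooled.sort()
    rank_sums = [0.0] * len(groups)
    for rank, (_, _, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    n = len(pooled)
    weighted = sum(rs * rs / len(v) for rs, v in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)


def pvalue_three_groups(h):
    """Upper tail of a chi-square with 2 d.f.: P(X > h) = exp(-h / 2). K = 3 only."""
    return math.exp(-h / 2.0)
```

For K = 3, homogeneity would be rejected at level .05 when `pvalue_three_groups(kruskal_wallis_h(groups, random.Random()))` is below .05. For other values of K a chi-square tail with K - 1 degrees of freedom would be needed instead.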

We have carried out a simulation study applying the previous procedure to the Tukey depth
and to the random Tukey depth in the 2-dimensional case with Gaussian distributions,
K = 3 and sample sizes $n_1 = n_2 = n_3 = n$, where $n \in \{20, 30\}$. We have carried out
10,000 replications in each case at the significance level $\alpha$ = .05.

Concerning the selection of k, we have to compute the depths of points in samples
with sizes 3n = 60, 90. Thus, according to Table 2.1, we have taken k = 7 and 8 random
directions to project.



Results are shown in Table 2.4, where we also include in parentheses the results
applying the same procedure with the Tukey depth.

Table 2.4 Rate of rejections in 10,000 simulations using $D_{T,k}$ with k as shown (in
parentheses, the rate with $D_T$) to test the homogeneity in three samples of Gaussian
distributions with independent, identically distributed marginals and the stated values of r.
Dimension is p = 2. The significance level is .05.

Covariance matrices           Sample sizes and random directions
(scale factors)               n = 20 and k = 7    n = 30 and k = 8

$r_1 = r_2 = 1$               .05 (.05)           .05 (.06)
$r_1 = r_2 = 1.2$             .16 (.15)           .21 (.21)
$r_1 = 2$, $r_2 = 1.2$        .89 (.89)           .98 (.98)
$r_1 = r_2 = 2$               .96 (.97)           1 (1)


The results of both studies in this subsection are quite encouraging, because there are 
no important differences among the rejection rates with both depths in spite of the big 
differences on the employed number of directions. 

2.2 Computational time 

We end this section paying some attention to the computational time required to compute
the random Tukey depth. As a comparison we have selected the time to compute the
Mahalanobis depth, which is one of the quickest depths according to Table 1 in [15].



In Table 2.5 we present the mean time, along 200 simulations, employed to compute
the random Tukey and Mahalanobis depths for all points in a sample with the shown
sizes and dimensions. The number of employed random directions corresponds with those
obtained in Table 2.1.

Since the random Tukey and Mahalanobis depths are computed on the same samples,
the first depth to be computed may have an advantage in that the RAM may be
cleaner than when the second one is computed. To avoid this, in 100 of the 200
simulations we have computed the random Tukey depth first, and in the other 100
the Mahalanobis depth first.

The computations have been carried out on an Xserve G5 computer with a dual 2.3 GHz
PowerPC G5 processor and 2 GB of RAM.



Table 2.5 Time, in seconds, to compute the random Tukey and the Mahalanobis depths
(in parentheses) of all points in a sample with size n taken from a standard Gaussian
distribution.

           Random                                    Sample size
Dimension  vectors           n = 100                          n = 500          n = 1,000

p = 2      k = 8, 9, 11      $5.9344 \cdot 10^{-4}$ (.0027)   .0025 (.0091)    .0060 (.0176)
p = 4      k = 12, 18, 20    $8.0011 \cdot 10^{-4}$ (.0026)   .0064 (.0094)    .0144 (.0178)
p = 8      k = 13, 27, 35    $8.5459 \cdot 10^{-4}$ (.0027)   .0119 (.0098)    .0334 (.0185)
p = 25     k = 12, 25, 36    $8.3209 \cdot 10^{-4}$ (.0031)   .0104 (.0111)    .0356 (.0220)
p = 50     k = 12, 26, 34    $8.9296 \cdot 10^{-4}$ (.0048)   .0116 (.0161)    .0325 (.0296)


The main computational effort to compute the Mahalanobis depth is devoted to obtaining
the inverse of the covariance matrix. In consequence, the computational time for this
depth grows without bound with the dimension.

On the other hand, the main difficulty in computing the random Tukey depth is obtaining
the projections of the involved points. Thus, the main increment in required time
for the random Tukey depth comes from the increment in k. Taking into account that,
according to Table 2.1, k, as a function of p, is bounded, the required time to compute
random depths of a sample should not increase as quickly. This is made apparent in
Table 2.5 where, except for n = 100, the maximum computation time is not attained in
the highest dimension.



3 Functional random Tukey depth. Functional classification 

An interesting possibility of the random Tukey depth is that it can be straightforwardly
extended to functional spaces. The only requirement of the main results in [4] is that the
sample space has to be a separable Hilbert space. Thus, in this section we will assume 
that we are considering a distribution P defined on this kind of space. 

Concerning the number of random directions to take, it is possible to consider the
infinite dimensional Hilbert spaces as the limit of finite dimensional Euclidean spaces,
and, then, given a sample size n, it is enough to take k as the maximum of the values
provided in Table 2.1 for this sample size.



For the reasons given in the introduction, we will directly check how this depth works in
practice. To this end, we have repeated the classification problem carried out in [12], where
the authors handle a data set consisting of the growth curves of a sample of 39 boys and 54
girls, with the goal of classifying them, by sex, using just this information. Heights were measured
at 31 times in the period from one to eighteen years. The data were taken from the
file growth.zip, downloaded from ftp://ego.psych.mcgill.ca/pub/ramsay/FDAfuns/Matlab.
The data are drawn in Figure 3.1.




Figure 3.1 Growth curves of 54 girls (left hand side) and 39 boys (right hand side)
measured 31 times each between 1 and 18 years.

It is well known that when handling this kind of data, it is useful to consider not only 
the growth curve but also the acceleration of height (see, for instance, [16]). However, since 
we are mainly interested in comparing our results with those obtained in [12], where 
only growth curves were considered, here we will do the same. Indeed, we will repeat the 
study with three differences: 

1. Most importantly, we will replace the functional depths handled there by the random 
Tukey depth. 

2. In [12] the authors consider the curves as elements in $L^1[0,1]$, which is not possible 
here, because we need a separable Hilbert space. 

We will assume that $\mathbb{H}$ is the space of square-integrable functions on a given interval 
which, after re-scaling, we can assume to be $[0,1]$. Thus, $\mathbb{H} = L^2[0,1]$ and, given 
$f, g \in \mathbb{H}$, we have that $\langle f, g \rangle = \int_0^1 f(t)g(t)\,dt$. 

3. In [12], the authors smoothed the original data using a spline basis. We have skipped 
this step because it did not seem necessary to us. 
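
Since the curves are only observed at finitely many times and are not smoothed, the inner product has to be approximated numerically. A minimal sketch, assuming the trapezoidal rule on the re-scaled observation times (the quadrature rule, the helper names and the ages used in the example are hypothetical choices, not taken from the paper):

```python
import math


def rescale(ages):
    """Map increasing observation ages linearly onto the interval [0, 1]."""
    lo, hi = ages[0], ages[-1]
    return [(a - lo) / (hi - lo) for a in ages]


def trapezoid(values, times):
    """Trapezoidal rule for a function sampled at increasing times."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def inner_product(f, g, times):
    """Approximate <f, g>, the integral of f(t) g(t) over [0, 1]."""
    return trapezoid([a * b for a, b in zip(f, g)], times)


def norm(f, times):
    """Approximate L2 norm induced by the inner product above."""
    return math.sqrt(inner_product(f, f, times))


# Hypothetical, unequally spaced ages between 1 and 18 years.
times = rescale([1.0, 1.5, 2.0, 4.0, 8.0, 18.0])
identity = list(times)          # f(t) = t
ones = [1.0] * len(times)       # g(t) = 1
value = inner_product(identity, ones, times)
```

The trapezoidal rule is exact for piecewise-linear integrands, so in this example `value` agrees with the integral of t over [0, 1] up to rounding.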

The classification procedure can be extended to an arbitrary number of groups but, 
just to keep the notation as simple as possible, we will assume that we have just two 
groups. Thus, let us assume that we have two samples $X_1,\ldots,X_n$ and $Y_1,\ldots,Y_m$ in $\mathbb{H}$ 
selected from two populations and that we are interested in classifying another curve 
$Z \in \mathbb{H}$ in one of those groups using a depth $D$ to be chosen later. Three classification 
methods are proposed in [12]: 

1. - Distance to the trimmed mean (M) 

Compute the depths of the points in the sample $X_1,\ldots,X_n$. Choose $\alpha \in [0,1)$. The 
$\alpha$-trimmed mean of this sample, $\hat{\mu}_\alpha(X)$, is the mean of the $n(1-\alpha)$ deepest points. 
Given $\beta \in [0,1)$, compute similarly $\hat{\mu}_\beta(Y)$, the $\beta$-trimmed mean of the sample $Y_1,\ldots,Y_m$. 
Now, we classify $Z$ in the first group if 

$$\|Z - \hat{\mu}_\alpha(X)\| < \|Z - \hat{\mu}_\beta(Y)\|.$$

Otherwise we classify $Z$ in the second group. 
When applying this method, we take $\alpha = \beta = 0.2$. 
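
The rule of method M could be sketched in Python as follows. The depths are passed in as plain numbers, so any depth can be plugged in. The rounding of the number of retained curves, the uniform-grid L2 distance and the function names are assumptions made for illustration only.

```python
import math


def distance(f, g, dt):
    """Discretized L2 distance between two curves on a uniform grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) * dt)


def trimmed_mean(sample, depths, alpha):
    """Pointwise mean of the round(n * (1 - alpha)) deepest curves.
    The rounding convention is an assumption of this sketch."""
    n = len(sample)
    keep = max(1, round(n * (1 - alpha)))
    deepest = sorted(range(n), key=lambda i: depths[i], reverse=True)[:keep]
    n_grid = len(sample[0])
    return [sum(sample[i][j] for i in deepest) / keep for j in range(n_grid)]


def classify_m(z, xs, depths_x, ys, depths_y, alpha=0.2, beta=0.2, dt=1.0):
    """Return 1 if z is strictly closer to the trimmed mean of xs, else 2."""
    d_x = distance(z, trimmed_mean(xs, depths_x, alpha), dt)
    d_y = distance(z, trimmed_mean(ys, depths_y, beta), dt)
    return 1 if d_x < d_y else 2


# Hypothetical two-point "curves"; the shallowest curve of each group is an outlier.
xs = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [1.0, 1.0], [100.0, 100.0]]
ys = [[9.0, 9.0], [10.0, 10.0], [11.0, 11.0], [10.0, 10.0], [-50.0, -50.0]]
depths_x = [0.2, 0.6, 0.2, 0.6, 0.1]
depths_y = [0.2, 0.6, 0.2, 0.6, 0.1]
label = classify_m([2.0, 2.0], xs, depths_x, ys, depths_y)
```

With a trimming level of 0.2 and five curves per group, the single shallowest curve of each group is discarded before averaging, which is what protects the group representative from the outliers in this toy example.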

2. - Weighted average distance (AM) 

In some sense, in method M, each group is represented by its trimmed mean. Here, 
we compute the distance between Z and the group as a weighted mean of the distances 
between Z and the members of the group where the weights are the depths of the points. 

Thus, we would classify the function $Z$ in the first group only if 

$$\frac{\sum_{i=1}^{n} \|Z - X_i\| D_X(X_i)}{\sum_{i=1}^{n} D_X(X_i)} < \frac{\sum_{j=1}^{m} \|Z - Y_j\| D_Y(Y_j)}{\sum_{j=1}^{m} D_Y(Y_j)},$$

where the subscripts in $D_X$ and $D_Y$ mean that the depths are computed with respect to 
the empirical distribution associated to the corresponding sample. 
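
A minimal Python sketch of the AM rule follows. It assumes that the weighted mean is normalized by the sum of the depths, as is usual for a weighted mean, and that the distance is a discretized L2 distance on a uniform grid. The function names and the toy numbers are hypothetical.

```python
import math


def distance(f, g, dt):
    """Discretized L2 distance between two curves on a uniform grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) * dt)


def weighted_average_distance(z, sample, depths, dt=1.0):
    """Mean of the distances from z to the curves in sample, weighted by depth.
    Normalizing by the sum of the depths is an assumption of this sketch."""
    weighted = sum(distance(z, s, dt) * d for s, d in zip(sample, depths))
    return weighted / sum(depths)


def classify_am(z, xs, depths_x, ys, depths_y, dt=1.0):
    """Return 1 if z has the smaller weighted average distance to xs, else 2."""
    w_x = weighted_average_distance(z, xs, depths_x, dt)
    w_y = weighted_average_distance(z, ys, depths_y, dt)
    return 1 if w_x < w_y else 2


# Hypothetical one-point "curves" and depths.
xs, depths_x = [[1.0], [3.0]], [1.0, 3.0]
ys, depths_y = [[5.0], [6.0]], [1.0, 1.0]
w_x = weighted_average_distance([0.0], xs, depths_x)   # (1*1 + 3*3) / 4
label = classify_am([0.0], xs, depths_x, ys, depths_y)
```

Unlike method M, every curve of the group contributes here, but shallow curves count less because their depths are small.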

3. - Trimmed weighted average distance (TAM) 

In the AM method, the result of the classification could be affected by the number of 
elements in each sample if $n \neq m$. The solution for this consists of taking a third value 
$l \le \min(n, m)$ and replacing the AM classification rule by 

$$\frac{\sum_{i=1}^{l} \|Z - X_{(i)}\| D_X(X_{(i)})}{\sum_{i=1}^{l} D_X(X_{(i)})} < \frac{\sum_{i=1}^{l} \|Z - Y_{(i)}\| D_Y(Y_{(i)})}{\sum_{i=1}^{l} D_Y(Y_{(i)})},$$

where $X_{(1)}$ is the deepest point in the X-sample, $X_{(2)}$ is the second deepest point in the 
X-sample, and so on, and similarly for the Y-sample. 
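
The trimmed variant only changes which curves enter the weighted mean. A sketch under the same assumptions as for AM (normalization by the sum of the retained depths, discretized L2 distance, hypothetical names), with the additional assumption that l defaults to min(n, m) when it is not given:

```python
import math


def distance(f, g, dt):
    """Discretized L2 distance between two curves on a uniform grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) * dt)


def trimmed_weighted_distance(z, sample, depths, l, dt=1.0):
    """Depth-weighted mean distance from z to the l deepest curves of sample."""
    order = sorted(range(len(sample)), key=lambda i: depths[i], reverse=True)
    deepest = order[:l]
    weighted = sum(distance(z, sample[i], dt) * depths[i] for i in deepest)
    return weighted / sum(depths[i] for i in deepest)


def classify_tam(z, xs, depths_x, ys, depths_y, l=None, dt=1.0):
    """Return 1 if z is closer to xs in the trimmed weighted sense, else 2."""
    if l is None:
        l = min(len(xs), len(ys))  # assumed default, so both groups use l curves
    t_x = trimmed_weighted_distance(z, xs, depths_x, l, dt)
    t_y = trimmed_weighted_distance(z, ys, depths_y, l, dt)
    return 1 if t_x < t_y else 2


# Hypothetical samples of different sizes (n = 3, m = 2).
xs, depths_x = [[1.0], [3.0], [10.0]], [0.5, 0.3, 0.1]
ys, depths_y = [[4.0], [6.0]], [0.5, 0.5]
t_x = trimmed_weighted_distance([0.0], xs, depths_x, 2)  # (1*0.5 + 3*0.3) / 0.8
label = classify_tam([0.0], xs, depths_x, ys, depths_y)
```

Because both groups are represented by the same number l of curves, the comparison no longer depends on the sample sizes n and m.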

In [12] the authors consider three possibilities to split the sample into training and 
validation sets. We have analyzed all three possibilities, but in order to shorten the 
exposition we will only present the results corresponding to the cross-validation setting. 
However, we want to remark that, when using the random Tukey depth, the differences 
between the error rates obtained with those possibilities are less important than those 
reported in [12]. 

Regarding the selection of k, since the larger sample size is around 50, following the 
suggestion at the beginning of this section, we have taken k = 10. 

In Table 3.2 we show the failure rates obtained using the described methods, the 
random Tukey depth and the depths proposed in [12]. The last three columns contain the 
error rates obtained with the depths handled in [12]. They are the band depth determined 
by three different curves (DS3), by four different curves (DS4) and the generalized band 
depth (DGS). Their values have been taken from Tables 1-3 in [12]. 

On the other hand, taking into account the random nature of the proposed depth, we 
have run each classification method 10,000 times with the random Tukey depth. The 
second column in Table 3.2 contains the rate of errors we have obtained. 
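
The evaluation loop can be sketched as follows. The sketch assumes that the cross-validation is of the leave-one-out kind, which the text above does not state explicitly, and that a fresh classifier (with freshly drawn random directions in the case of the random Tukey depth) is built for every repetition. All names are hypothetical, and the nearest-mean classifier only serves to keep the example short.

```python
import random


def leave_one_out_error(curves, labels, classify):
    """Fraction of curves misclassified when each one is held out in turn.
    classify(z, xs, ys) must return 1 or 2, where xs and ys are the
    remaining curves with labels 1 and 2 respectively."""
    errors = 0
    for i, (z, label) in enumerate(zip(curves, labels)):
        xs = [c for j, c in enumerate(curves) if j != i and labels[j] == 1]
        ys = [c for j, c in enumerate(curves) if j != i and labels[j] == 2]
        if classify(z, xs, ys) != label:
            errors += 1
    return errors / len(curves)


def mean_error(curves, labels, make_classifier, repetitions, rng):
    """Average leave-one-out error over independent repetitions; each
    repetition calls make_classifier(rng) to obtain a new classifier."""
    rates = [
        leave_one_out_error(curves, labels, make_classifier(rng))
        for _ in range(repetitions)
    ]
    return sum(rates) / repetitions


def nearest_mean(z, xs, ys):
    """Toy deterministic classifier standing in for a depth-based one."""
    mean_x = sum(c[0] for c in xs) / len(xs)
    mean_y = sum(c[0] for c in ys) / len(ys)
    return 1 if abs(z[0] - mean_x) < abs(z[0] - mean_y) else 2


# Hypothetical data: the curve [9.0] carries label 1 but sits next to group 2.
curves = [[0.0], [1.0], [9.0], [10.0], [11.0]]
labels = [1, 1, 1, 2, 2]
rate = mean_error(curves, labels, lambda rng: nearest_mean, 3, random.Random(0))
```

In a real run the classifier returned by `make_classifier` would recompute the depths inside each training set, so that the held-out curve never influences its own classification.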

To facilitate comparisons, we present in bold the lowest failure rate for each method. 

Once again, in spite of the low number of random projections, the results are similar 
to those in [12], the random Tukey depth with the AM method being the global winner. 



Table 3.2 Rate of mistakes when classifying the growth curves by sex using cross validation 
for the shown methods and depths. 

| Classification method | Random Tukey (k = 10) | DS3 (from [12]) | DS4 (from [12]) | DGS (from [12]) |
|---|---|---|---|---|
| M | .2033 | .1828 | .1828 | **.1613** |
| AM | **.1485** | .2473 | .2473 | .1935 |
| TAM | **.1651** | .2436 | .2436 | .1690 |



4 Discussion 

In this paper we introduce a random depth which can be considered as a random 
approximation to the Tukey depth. The new depth is interesting because of the little effort 
required for its computation and because it can be extended to cover Hilbert-valued 
data. 

From a theoretical point of view, this random depth enjoys no new properties. Its 
interest lies in the fact that, taking very few one-dimensional projections, it is possible 
to obtain results similar to those obtained with more involved depths. The number of 
required projections is surprisingly low. In fact, for sample sizes smaller than or equal 
to 1,000, it seems that 36 projections suffice for every dimension. 

If the dimension of the space is fixed, the number of required projections increases 
with the sample size. This dependence is related to the fact that when the sample size is 
small, the randomness included in the sample makes the gain achieved by considering a 
high number of projections useless. 

On the other hand, if we fix the sample size, the number of required projections 
increases with the dimension up to the point at which the dimension is too large to allow 
a reasonable estimation of the underlying distribution. From this point on, the number 
decreases. The initial increment is related to the curse of dimensionality. The later 
decrement is due to the uncertainty about the parent distribution, which makes a large number 
of projections useless. 

We consider it remarkable that a moderate number of projections provides very 
good results even in the infinite-dimensional setting. This is shown in the comparisons 
with some other depths that we have carried out. Those studies do not show really 
important differences between the considered depths and the random Tukey one. Thus, 
we conclude that, at least under the conditions considered, the random Tukey depth is an 
alternative which is worth considering because of the small computational time required. 

References 

[1] Cuesta-Albertos, J., Cuevas, A. and Fraiman, R., 2007. A test for directional 
uniformity with applications to high-dimensional sphericity. Preprint. 

[2] Cuesta-Albertos, J., del Barrio, T., Fraiman, R. and Matran, C., 2007. The random 
projection method in goodness of fit for functional data. Comput. Statist. Data 
Anal., 51 4814-4831. 

[3] Cuesta-Albertos, J., Fraiman, R. and Ransford, T., 2006. Random projections and 
goodness-of-fit tests in infinite-dimensional spaces. Bull. Brazilian Math. Soc., 37(4) 
1-25. 

[4] Cuesta-Albertos, J., Fraiman, R. and Ransford, T., 2007. A sharp form of the 
Cramer-Wold theorem. J. Theoret. Probab., 20 201-209. 

[5] Cuevas, A., Febrero, M. and Fraiman, R., 2006. On the use of the bootstrap for 
estimating functions with functional data. Comput. Statist. Data Anal., 51 1063-1074. 

[6] Cuevas, A., Febrero, M. and Fraiman, R., 2007. Robust estimation and classification 
for functional data via projection-based depth notions. To appear in Comput. 
Statist. 

[7] Febrero, M., Galeano, P. and Gonzalez-Manteiga, W., 2007. Outlier detection in 
functional data by depth measures with application to identify abnormal NOx levels. 
To appear in Environmetrics. 

[8] Fraiman, R. and Muniz, G., 2001. Trimmed means for functional data. Test, 10 
419-440. 

[9] Hand, D.J., 2006. Classifier technology and the illusion of progress. Statist. Sci., 21(1) 
1-14. 

[10] Liu, R., Serfling, R. and Souvaine, D.L., editors, 2006. Data Depth: Robust 
Multivariate Analysis, Computational Geometry and Applications. American Mathematical 
Society, DIMACS Series, Vol. 72. 






[11] Liu, R.Y. and Singh, K., 2006. Rank tests for nonparametric description of dispersion. 
In: R. Liu, R. Serfling and D.L. Souvaine (Ed.), Data Depth: Robust Multivariate 
Analysis, Computational Geometry and Applications, American Mathematical 
Society, DIMACS Series, Vol. 72, 17-35. 

[12] Lopez-Pintado, S. and Romo, J., 2006. Depth-based classification for functional data. 
In: R. Liu, R. Serfling and D.L. Souvaine (Ed.), Data Depth: Robust Multivariate 
Analysis, Computational Geometry and Applications, American Mathematical 
Society, DIMACS Series, Vol. 72, 17-35. 

[13] Mahalanobis, P.C., 1936. On the Generalized Distance in Statistics. Proceed. Nat. 
Academy of India, 12 49-55. 

[14] Maronna, R.A., Martin, R.D. and Yohai, V.J., 2006. Robust Statistics. Theory and 
Methods. John Wiley & Sons, Chichester. 

[15] Mosler, K. and Hoberg, R., 2006. Data analysis and classification with the zonoid 
depth. In: R. Liu, R. Serfling and D.L. Souvaine (Ed.), Data Depth: Robust 
Multivariate Analysis, Computational Geometry and Applications, American 
Mathematical Society, DIMACS Series, Vol. 72, 17-35. 

[16] Ramsay, J.O. and Silverman, B.W., 1997. Functional Data Analysis. Springer Verlag, 
New York. 

[17] Tukey, J.W., 1975. Mathematics and picturing of data. Proceedings of ICM, 
Vancouver, 2 523-531. 

[18] Zuo, Y. and Serfling, R., 2000. General notions of statistical depth function. Ann. 
Statist., 28(2) 461-482. 

[19] Zuo, Y., 2003. Projection-based depth functions and associated medians. Ann. 
Statist., 31(5) 1460-1490. 

[20] Zuo, Y., 2006. Multidimensional trimming based on projection depth. Ann. Statist., 
34(5) 2211-2251. 






